National Repository of Grey Literature: 132 records found, showing records 1 - 10
Multi-Label Classification of Text Documents
Průša, Petr ; Očenášek, Pavel (referee) ; Bartík, Vladimír (advisor)
This master's thesis deals with the automatic classification of text documents. It explains the basic terms and problems of text mining, introduces clustering, and presents some basic clustering algorithms. It also presents several classification methods and examines matrix regression in detail. An application that uses matrix regression for classification was designed and developed. Experiments focused on normalization and thresholding.
Text Data Clustering
Leixner, Petr ; Burgetová, Ivana (referee) ; Bartík, Vladimír (advisor)
Text data clustering can be used to analyze, navigate and structure large sets of text or hypertext documents. The basic idea is to group the documents into a set of clusters on the basis of their similarity. The well-known text clustering methods, however, do not really solve the problems specific to text clustering, such as the high dimensionality of the input data, the very large size of the databases and the understandability of the cluster descriptions. This work addresses these problems and describes a modern method of text data clustering based on frequent term sets, which tries to overcome the deficiencies of other clustering methods.
Text data clustering algorithms
Sedláček, Josef ; Burget, Radim (referee) ; Karásek, Jan (advisor)
The thesis deals with text mining. It describes the theory of text document clustering as well as the algorithms used for clustering. This theory serves as the basis for developing an application for clustering text data. The application is written in the Java programming language and contains three clustering methods; the user can choose which method is used to cluster the collection of documents. The implemented methods are K-medoids, BiSec K-medoids, and SOM (self-organizing maps). The application also includes a validation set, created specifically for this diploma thesis, which is used for testing the algorithms. Finally, the algorithms are compared according to the results obtained.
DNA Microarrays Data Analysis
Hebelka, Tomáš ; Jaša, Petr (referee) ; Burgetová, Ivana (advisor)
This work is concerned with the analysis of DNA microarray data using cluster analysis. It explains the biological terms gene expression and DNA microarray. Next, it gives a mathematical and computational description of clustering methods and describes a way to apply these methods to microarray data. It then covers implementation details of the k-means and DBSCAN clustering methods and introduces an original clustering algorithm, Strom++. A description of the implementation and a user manual follow. Finally, the results achieved are evaluated.
Object Detection and Tracking Using Interest Points
Bílý, Vojtěch ; Hradiš, Michal (referee) ; Juránek, Roman (advisor)
This paper deals with object detection and tracking using interest points. Existing approaches are described. A new method based on the Generalized Hough transform and iterative Hough-space searching is proposed. The generality of the proposed detector is demonstrated on various types of objects. Object tracking is designed as frame-by-frame detection.
Unsupervised Evaluation of Speaker Recognition System
Odehnal, Ondřej ; Plchot, Oldřich (referee) ; Matějka, Pavel (advisor)
This thesis builds on a modern speaker recognition (SID) system based on x-vectors. The aim of the bachelor's thesis is to design and experimentally evaluate techniques for evaluating the SID system using audio recordings without annotation, i.e. without knowledge of the speaker. For this purpose, an embedding is created from each unannotated recording. The embeddings are then used to cluster the recordings and to create pseudo-annotations. The SID system is evaluated on these annotations using the equal error rate (EER) metric. To create the pseudo-annotations, the following unsupervised clustering algorithms were proposed: K-means, Gaussian mixture models (GMM) and agglomerative clustering. After testing, the best experimental procedure proved to be K-means with the Silhouette metric, which uses cosine similarity as the distance measure. The best method achieved 5.72 % EER against a reference EER = 5.15 %, which was computed with knowledge of the annotation, on the dev-core-core part of the SITW dataset. Similar results were obtained on the eval-core-core part of the SITW dataset, with an estimated EER = 5.86 % and a reference of 5.08 %. The difference between the values is 0.57 % for dev-core-core and 0.78 % for eval-core-core. Further tests on the NIST SRE16 and VoxCeleb1 datasets were carried out to verify the correctness of the proposed procedure. In general, the proposed evaluation procedure had an error of approximately 1 %, which is a fairly good result for an unsupervised learning algorithm.
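As a rough illustration of the equal error rate (EER) metric named in the abstract above, the following minimal sketch sweeps a decision threshold over verification scores and reports the point where the false acceptance and false rejection rates are closest. It is not taken from the thesis: the function name `equal_error_rate`, the score lists and the simple threshold sweep are all assumptions made for illustration, and trial labels (target or non-target) could equally come from pseudo-annotations or from real annotations.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Hypothetical helper, not the thesis implementation. A trial is accepted
    when its score is greater than or equal to the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap = None
    eer = None
    for t in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Made-up scores, used only to exercise the function.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.1, 0.2, 0.4, 0.6]
print(equal_error_rate(targets, nontargets))
```

With a finite set of trials the two error rates may never be exactly equal, which is why the sketch averages them at the closest point; production toolkits typically interpolate along the error curve instead.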
Clustering of Biological Sequences
Kubiš, Radim ; Burgetová, Ivana (referee) ; Martínek, Tomáš (advisor)
One of the main reasons for clustering proteins is the prediction of their structure, function and evolution. Many current tools suffer from high computational complexity due to all-to-all sequence alignment. Tools that run faster do not reach the accuracy of the others. A further disadvantage is that they operate at high similarity levels, although homologous proteins can be similar at lower identity. The clustering process often ends upon reaching a condition that does not reflect sufficient cluster quality. This master's thesis describes the design and implementation of a new tool for clustering protein sequences. The new tool should not be computationally demanding, yet it should preserve the required accuracy and produce better clusters. The thesis also describes the testing of the designed tool, the evaluation of results and possibilities for its further development.
Knowledge Discovery in Multimedia Databases
Málik, Peter ; Bartík, Vladimír (referee) ; Chmelař, Petr (advisor)
This master's thesis deals with knowledge discovery in multimedia databases. It covers the general principles of knowledge discovery in databases, with particular attention to cluster analysis methods used for data mining in large, multidimensional databases. The next chapter introduces multimedia databases, focusing on the extraction of low-level features from image and video data. The practical part is an implementation of the BIRCH, DBSCAN and k-means methods for cluster analysis. The final part is dedicated to experiments on the TRECVid 2008 dataset and a description of the results achieved.
Image Database Query by Example
Dobrotka, Matúš ; Hradiš, Michal (referee) ; Veľas, Martin (advisor)
This thesis deals with content-based image retrieval. Its objective is to develop an application that compares different approaches to image retrieval. The first, basic approach consists of keypoint detection, local feature extraction and the creation of a visual vocabulary with the k-means clustering algorithm. This visual vocabulary is used to compute a histogram of visual word occurrence counts, the Bag of Words (BoW), which represents an image globally. Similar images are then found by applying an appropriate metric. The second approach uses deep convolutional neural networks (DCNN) to extract feature vectors. These vectors are used to create a visual vocabulary, which is then used to calculate the BoW. The rest of the procedure is similar to the first approach. The third approach uses the vectors extracted by the DCNN directly as BoW vectors, followed by applying an appropriate metric and finding similar images. The conclusion describes the approaches, the experiments and the final evaluation.
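The Bag of Words step described in the abstract above can be sketched as follows: each local descriptor is assigned to its nearest visual word, the assignments are counted into a histogram, and two images are compared through their histograms. This is only an illustrative sketch under stated assumptions, not the thesis's application: the vocabulary is taken as given (in the abstract it comes from k-means), and the function names, the Euclidean assignment and the cosine comparison are hypothetical choices.

```python
from math import dist


def bow_histogram(descriptors, vocabulary):
    """Count how many descriptors fall nearest to each visual word.

    Hypothetical helper: `vocabulary` is a list of cluster centres that is
    assumed to have been produced beforehand (for example by k-means).
    """
    histogram = [0] * len(vocabulary)
    for d in descriptors:
        # Index of the closest visual word under Euclidean distance.
        nearest = min(range(len(vocabulary)), key=lambda i: dist(d, vocabulary[i]))
        histogram[nearest] += 1
    return histogram


def cosine_similarity(a, b):
    """Cosine similarity between two histograms; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy two-word vocabulary and made-up 2-D descriptors for two images.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
image_a = bow_histogram([(1.0, 1.0), (0.0, 2.0), (9.0, 9.0)], vocabulary)
image_b = bow_histogram([(0.5, 0.5), (11.0, 10.0)], vocabulary)
print(image_a, image_b, cosine_similarity(image_a, image_b))
```

Real local descriptors have many more dimensions than this toy data, and the choice of comparison metric is one of the things such an application would be expected to vary.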
Knowledge Discovery from Data - Clustering Algorithms
Kapavík, Radim ; Burgetová, Ivana (referee) ; Bartík, Vladimír (advisor)
This work deals with cluster analysis, focusing on the problem of determining the parameters these methods require. Most of the work is dedicated to describing an implementation of the density-based DENCLUE method and to proposing an appropriate way to set its key parameter, known as sigma, automatically.
